Abstract

The article reviews different definitions for a convolutional code which can be found 
in the literature. The algebraic differences between the definitions are worked out in 
detail. It is shown that bi-infinite support systems are dual to finite-support systems 
under Pontryagin duality. In this duality the dual of a controllable system is observable 
and vice versa. Uncontrollability can occur only if there are bi-infinite support trajec- 
tories in the behavior, so finite and half-infinite-support systems must be controllable. 
Unobservability can occur only if there are finite support trajectories in the behavior, 
so bi-infinite and half-infinite-support systems must be observable. It is shown that 
the different definitions for convolutional codes are equivalent if one restricts attention 
to controllable and observable codes. 

Keywords: Convolutional codes, linear time-invariant systems, behavioral system 
theory. 

1 Introduction 

It is common knowledge that there is a close connection between linear systems over finite 
fields and convolutional codes. In the literature one finds however a multitude of definitions 
for convolutional codes, which can make it confusing for somebody who wants to enter this 
research field with a background in systems theory or symbolic dynamics. It is the purpose 
of this article to provide a survey of the different points of view about convolutional codes. 
was a guest professor at EPFL in Switzerland. The author would like to thank EPFL for its support and
hospitality. 



The article is structured as follows: In Section 2 we will review the way convolutional
codes have often been defined in the coding literature j20|, ^Ij, |3^ . 

Section 3 reviews a definition for convolutional codes that can be found in the literature on
symbolic dynamics. From the symbolic dynamics point of view [24], [29], [32], a convolutional
code is a linear irreducible shift space.

In Section 4 we will review the class of time-invariant, complete linear behaviors in the
sense of Willems [50], [51], [52]. We will show how these behaviors relate to the definitions
given in Sections 2 and 3.

In Section [5] we will give a definition for convolutional codes in which it is required 
that the code words have finite support. Such a definition was considered by Fornasini and 
Valcher [48], |] and by the author in collaboration with Schumacher, Weiner and York [42,
43, 39]. The study of behaviors with finite support has been done earlier in the context of
automata theory and we refer to Eilenberg's book |]J . We show in Section 5 how this
module-theoretic definition relates to complete, linear and time-invariant behaviors by
Pontryagin duality.

In Section 6 we will study different first-order representations connected with the
different viewpoints. Finally, in Section 7 we compare the different definitions. We also show
how cyclic redundancy check codes can naturally be viewed in the context of finite-support
convolutional codes.

Throughout the paper we will emphasize the algebraic properties of the different defini- 
tions. We will also restrict ourselves to the concrete setting of convolutional codes defined 
over finite fields. It is however known that many of the concepts in this paper generalize 
to group codes 0, [T^, |D| and multidimensional convolutional codes j|, [^, [T^, [3SL K9|. All 
of the definitions which we are going to give are quite similar, but there are some notable 
differences. 

Since the paper draws from results from quite different research areas, one is faced with 
the problem that there is no uniform notation. In this paper we will adopt the convention 
used in systems theory in which vectors are regarded as column vectors. For the convenience 
of the reader, we conclude this section with a summary of some of the notation used in this 
paper: 
$\mathbb{F}$                   A fixed finite field;
$\mathbb{F}[z]$                The polynomial ring over $\mathbb{F}$;
$\mathbb{F}[z, z^{-1}]$        The Laurent polynomial ring over $\mathbb{F}$;
$\mathbb{F}(z)$                The field of rationals;
$\mathbb{F}[[z]]$              The ring of formal power series of the form $\sum_{i=0}^{\infty} a_i z^i$;
$\mathbb{F}((z))$              The field of formal Laurent series having the form $\sum_{i=d}^{\infty} a_i z^i$;
$\mathbb{F}[[z, z^{-1}]]$      The ring of formal power series of the form $\sum_{i=-\infty}^{\infty} a_i z^i$;
$\mathbb{Z}$                   The integers;
$\mathbb{Z}_+$                 The nonnegative integers;
$\mathbb{Z}_-$                 The nonpositive integers.

Consider the ring of formal power series $\mathbb{F}[[z, z^{-1}]]$. We will identify the set $\mathbb{F}[[z, z^{-1}]]$
with the (two-sided) sequence space $\mathbb{F}^{\mathbb{Z}}$. We have natural embeddings:



$$\mathbb{F}[z] \hookrightarrow \mathbb{F}[z, z^{-1}] \hookrightarrow \mathbb{F}(z) \hookrightarrow \mathbb{F}((z)) \hookrightarrow \mathbb{F}[[z, z^{-1}]], \qquad \mathbb{F}[z] \hookrightarrow \mathbb{F}[[z]] \hookrightarrow \mathbb{F}((z)).$$

With these embeddings we can view e.g. the set of rationals $\mathbb{F}(z)$ as a subset of the sequence
space $\mathbb{F}^{\mathbb{Z}}$, and we will make use of such identifications throughout the paper.

The set of $n$-vectors with polynomial entries will be denoted by $\mathbb{F}^n[z]$. Similarly we
define the sets $\mathbb{F}^n(z)$, $\mathbb{F}^n((z))$ etc. All these sets are subsets of the two-sided sequence space
$(\mathbb{F}^n)^{\mathbb{Z}} = \mathbb{F}^n[[z, z^{-1}]]$. The definitions of convolutional codes which we will provide in the next
sections will all be $\mathbb{F}$-linear subspaces of $(\mathbb{F}^n)^{\mathbb{Z}}$.

The idea of writing a survey on the different points of view about convolutional codes 
was suggested to the author by Paul Fuhrmann during a stimulating workshop on "Codes, 
Systems and Graphical Models" at the Institute for Mathematics and its Applications (IMA) 
in August 1999. A first draft of this paper was circulated in October 1999 to about a dozen 
people interested in these research issues. This generated an interesting 'Internet discussion' 
on these issues, in which the different opinions were exchanged by e-mail. Some of these 
ideas have been incorporated into the final version of the paper and the author would like 
to thank Dave Forney, Paul Fuhrmann, Heide Gluesing-Luerssen, Jan Willems and Sandro 
Zampieri for having provided valuable thoughts. The author wishes also to thank the IMA 
and its superb staff, who made the above mentioned workshop possible. 

2 The linear algebra point of view 

The theory of convolutional codes grew out of the theory of linear block codes and extended
it in a new direction. For this reason we start the section with linear block codes and
introduce convolutional codes in a quite intuitive way.

An $[n, k]$ linear block code is by definition a linear subspace $\mathcal{C} \subset \mathbb{F}^n$ having dimension
$\dim \mathcal{C} = k$. Let $G$ be an $n \times k$ matrix with entries in $\mathbb{F}$. The linear map

$$\varphi : \mathbb{F}^k \to \mathbb{F}^n, \quad m \mapsto c = Gm$$

is called an encoding map for the code $\mathcal{C}$ if $\operatorname{im}(\varphi) = \mathcal{C}$. If this is the case then we say $G$ is a
generator matrix or an encoder for the block code $\mathcal{C}$.
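
As a small illustration of the encoding map $c = Gm$, the following sketch enumerates a block code over the binary field. The choice $\mathbb{F} = GF(2)$ and the $3 \times 2$ generator matrix are assumptions made only for this example; they do not come from the text.

```python
# Minimal sketch: block encoding c = G m over GF(2) (field assumed for illustration).
# G is a hypothetical 3 x 2 generator matrix, so n = 3 and k = 2.

def encode_block(G, m, p=2):
    """Return the code word c = G m with arithmetic modulo the prime p."""
    return [sum(g * x for g, x in zip(row, m)) % p for row in G]

G = [[1, 0],
     [0, 1],
     [1, 1]]   # third coordinate is the parity of the two message symbols

# The code is the image of the encoding map: all G m for m in F^2.
codewords = {tuple(encode_block(G, [a, b])) for a in (0, 1) for b in (0, 1)}
```

Since the columns of this $G$ are linearly independent, the image has $2^k = 4$ elements, as expected for a code of dimension $k = 2$.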

Assume that a sequence of message blocks $m_0, \ldots, m_t \in \mathbb{F}^k$ should be encoded into a
corresponding sequence of code words $c_i = Gm_i \in \mathbb{F}^n$, $i = 0, \ldots, t$. By introducing the
polynomial vectors $m(z) = \sum_{i=0}^{t} m_i z^i \in \mathbb{F}^k[z]$ and $c(z) = \sum_{i=0}^{t} c_i z^i \in \mathbb{F}^n[z]$ it is possible to
describe the encoding procedure through the module homomorphism:

$$\varphi : \mathbb{F}^k[z] \to \mathbb{F}^n[z], \quad m(z) \mapsto c(z) = Gm(z). \tag{2.1}$$

The original idea of a convolutional code goes back to the paper of Elias [|| , where it was
suggested to use a polynomial matrix $G(z)$ in the encoding procedure (2.1).

Polynomial encoders $G(z)$ are physically easily implemented through a feedforward linear
sequential circuit. Massey and Sain [34], [45] showed that there is a close connection between
linear systems and convolutional codes. Massey and Sain viewed the polynomial encoder
$G(z)$ as a transfer function. (Throughout the paper we use the symbol $\varphi$ to denote an
encoding map; the context will make it clear what the domain and the range of this map are
in each situation.) More generally it is possible to realize a transfer function $G(z)$
with rational entries by (see e.g. ^]) a linear sequential circuit whose elements include
feedback components. If one allows rational entries in the encoding matrix then it seems
natural to extend the possible message sequences to the set of rational vectors $m(z) \in \mathbb{F}^k(z)$
and to process this sequence by a 'rational encoder' resulting again in a rational code vector
$c(z) \in \mathbb{F}^n(z)$. With this we have a first definition of a convolutional code as it can be found
e.g. in the Handbook of Coding Theory [35, Definition 2.4]:

Definition A An $\mathbb{F}(z)$-linear subspace $\mathcal{C}$ of $\mathbb{F}^n(z)$ is called a convolutional code.

If $G(z)$ is an $n \times k$ matrix with entries in $\mathbb{F}(z)$ whose columns form a basis for $\mathcal{C}$, then we
call $G(z)$ a generator matrix or an encoder for the convolutional code $\mathcal{C}$. $G(z)$ describes the
encoding map:

$$\varphi : \mathbb{F}^k(z) \to \mathbb{F}^n(z), \quad m(z) \mapsto c(z) = G(z)m(z).$$

The field of rationals $\mathbb{F}(z)$ viewed as a subset of the sequence space $\mathbb{F}^{\mathbb{Z}} = \mathbb{F}[[z, z^{-1}]]$ consists
precisely of those sequences whose support is finite on the negative sequence space $\mathbb{F}^{\mathbb{Z}_-}$ and
whose elements form an ultimately periodic sequence on the positive sequence space $\mathbb{F}^{\mathbb{Z}_+}$. It
therefore seems that one equally well could restrict the possible message words $m(z) \in \mathbb{F}^k(z)$
to sequences whose coordinates consist of Laurent polynomials only, in other words to
sequences of the form $m(z) \in \mathbb{F}^k[z, z^{-1}]$.

Alternatively one could allow message words $m(z)$ whose coordinates are not ultimately
periodic and possibly not of finite support on the negative sequence space $\mathbb{F}^{\mathbb{Z}_-}$. This would
suggest that one should take as possible message words the whole sequence space
$(\mathbb{F}^k)^{\mathbb{Z}} = \mathbb{F}^k[[z, z^{-1}]]$. The problem with this approach is that the multiplication of an element in
$\mathbb{F}[[z, z^{-1}]]$ with an element in $\mathbb{F}(z)$ is in general not well defined. If one restricts however
the message sequences to the field of formal Laurent series then the multiplication is well
defined. This leads to the following definition which goes back to the work of Forney [7].



The definition has been adopted in the book by Piret [38] and the book by Johannesson and
Zigangirov [21], and it appears as Definition 2.3 in the Handbook of Coding Theory.

Definition A' An $\mathbb{F}((z))$-linear subspace $\mathcal{C}$ of $\mathbb{F}^n((z))$ which has a basis of rational vectors
in $\mathbb{F}^n(z)$ is called a convolutional code.

The requirement that $\mathcal{C}$ has a basis with rational entries guarantees that $\mathcal{C}$ also has a
basis with only polynomial entries. $\mathcal{C}$ can therefore be represented by an $n \times k$ generator
matrix $G(z)$ whose entries consist only of rationals or even polynomials. The encoding
map with respect to $G(z)$ is given through:

$$\varphi : \mathbb{F}^k((z)) \to \mathbb{F}^n((z)), \quad m(z) \mapsto c(z) = G(z)m(z). \tag{2.2}$$

If G(z) is a polynomial matrix, then finitely many components of m(z) influence only 
finitely many components of c(z), and the encoding procedure may be physically imple- 
mented by a simple feedforward linear shift register. 
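
The feedforward encoding just described amounts to polynomial multiplication. The sketch below is a minimal illustration, assuming $\mathbb{F} = GF(2)$ and a rate $1/2$ encoder $G(z) = (1 + z + z^2,\ 1 + z^2)^t$; this encoder is a common textbook example and is not taken from the article.

```python
# Sketch: c(z) = G(z) m(z) over GF(2) for a polynomial (feedforward) encoder.
# Polynomials are coefficient lists, lowest degree first.

def poly_mul(a, b, p=2):
    """Multiply two polynomials with coefficients modulo the prime p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def encode(G_column, m, p=2):
    """Encode a scalar message polynomial m(z) with an n x 1 polynomial encoder."""
    return [poly_mul(g, m, p) for g in G_column]

G = [[1, 1, 1], [1, 0, 1]]   # assumed example: 1 + z + z^2 and 1 + z^2
m = [1, 0, 1, 1]             # m(z) = 1 + z^2 + z^3
c = encode(G, m)             # two polynomial code components
```

A message of degree 3 produces code components of degree at most 5: finitely many message symbols influence only finitely many code symbols.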

If $G(z)$ contains rational entries, then it is in general the case that a finite (polynomial)
message vector is encoded into an infinite (rational) code vector of the form $c(z) = \sum_{i=s}^{\infty} c_i z^i$.
This might cause some difficulties in the decoder. For the encoding process, $G(z)$ can be
physically realized by linear shift registers, in general with feedback (see e.g. [20, 21]).
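
To see how a rational entry turns a finite message into an infinite code sequence, one can expand a rational function as a power series. The following sketch assumes $\mathbb{F} = GF(2)$ and the hypothetical encoder entry $1/(1 + z + z^2)$; the helper name `series_div` is ours. It requires Python 3.8 or later for the modular inverse `pow(x, -1, p)`.

```python
# Sketch: power series expansion of num(z)/den(z) over GF(p), p prime.
# Coefficient lists are lowest degree first; den[0] must be nonzero mod p.

def series_div(num, den, terms, p=2):
    """Return the first `terms` coefficients of num(z)/den(z) modulo p."""
    inv = pow(den[0], -1, p)           # inverse of the constant term
    rem = list(num) + [0] * terms      # running remainder
    out = []
    for i in range(terms):
        q = (rem[i] * inv) % p
        out.append(q)
        for j, dj in enumerate(den):   # subtract q * z^i * den(z)
            if i + j < len(rem):
                rem[i + j] = (rem[i + j] - q * dj) % p
    return out

# The message m(z) = 1 passed through 1/(1 + z + z^2):
response = series_div([1], [1, 1, 1], 9)
```

Here the output never terminates but repeats with period 3, in line with the description of $\mathbb{F}(z)$ as the ultimately periodic sequences.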



From a systems theory point of view, it is classical [23] to view the encoding map (2.2) as
an input-output linear system. This was the point of view taken by Massey and Sain [34], [45]
and thereafter in most of the coding literature. However unlike in systems theory, the
important object in coding theory is the code $\mathcal{C} = \operatorname{im}(\varphi)$. As a result one calls encoders $\varphi$
which generate the same image $\operatorname{im}(\varphi)$ equivalent; we will say more about this in a moment.
In Sections 3 and 4 we will view (2.2) as an image representation of a time-invariant behavior
in the sense of Willems [50], [51], which we believe captures the coding situation in a more
natural way.

Assume that $G(z)$ and $\tilde{G}(z)$ are two $n \times k$ rational encoding matrices defining the same
code $\mathcal{C}$ with respect to either Definition A or A'. In this case we say that $G(z)$ and $\tilde{G}(z)$ are
equivalent encoders. The following lemma is a simple result of linear algebra:

Lemma 2.1 Two $n \times k$ rational encoders $G(z)$ and $\tilde{G}(z)$ are equivalent with respect to either
Definition A or A' if and only if there is a $k \times k$ invertible rational matrix $R(z)$ such that
$\tilde{G}(z) = G(z)R(z)$.



It follows from this lemma that Definition A and Definition A' are completely equivalent
with respect to equivalence of encoders.

From an algebraic point of view we can identify a convolutional code in the sense of
Definition A or Definition A' through an equivalence class of rational matrices. The following
theorem singles out a set of very desirable encoders inside each equivalence class.

Theorem 2.2 Let $G(z)$ be an $n \times k$ rational encoding matrix of rank $k$ defining a code $\mathcal{C}$.
Then there is a $k \times k$ invertible rational matrix $R(z)$ such that $\tilde{G}(z) = G(z)R(z)$ has the
properties:

(i) $\tilde{G}(z)$ is a polynomial matrix.

(ii) $\tilde{G}(z)$ is right prime.

(iii) $\tilde{G}(z)$ is column reduced with column degrees $\{e_1, \ldots, e_k\}$.

Furthermore, every polynomial encoding matrix of $\mathcal{C}$ which is right prime and column-reduced
has (unordered) column degrees $\{e_1, \ldots, e_k\}$. Thus these indices are invariants of the
convolutional code.

The essence of Theorem 2.2 was proved by Forney || Theorem 3] . In || Forney related
the indices appearing in (iii) to the controllability and observability indices of a controllable
and observable system. Paper M had an immense impact in the linear systems theory
literature. We will follow here the suggestion of McEliece and call these indices the
Forney indices of the convolutional code, despite the fact that Theorem 2.2 can be traced
back to the last century, when Kronecker, Hermite and in particular Dedekind and Weber
studied matrices over the rationals and more general function fields. In Sections 4 and 5
we will make a distinction between the Forney indices as defined above and the Kronecker
indices of a submodule of $\mathbb{F}^n[z]$.

In the coding literature [21], an encoder satisfying conditions (i), (ii) and (iii) of
Theorem 2.2 is called a minimal basic encoder.
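
Condition (iii) can be tested mechanically with a standard criterion from linear systems theory: a polynomial matrix is column reduced exactly when its leading column coefficient matrix has full column rank. The sketch below applies this criterion over $GF(2)$; the field, the function names and the two small matrices are assumptions chosen for illustration.

```python
# Sketch over GF(2); each entry of G is a coefficient list, lowest degree first.

def degree(poly):
    """Degree of a polynomial over GF(2); the zero polynomial gets -1."""
    nonzero = [d for d, c in enumerate(poly) if c % 2]
    return max(nonzero) if nonzero else -1

def column_degrees(G):
    """Column degrees e_1, ..., e_k of an n x k polynomial matrix."""
    return [max(degree(row[j]) for row in G) for j in range(len(G[0]))]

def leading_column_matrix(G):
    """0/1 matrix whose (i, j) entry is the coefficient of z^{e_j} in G[i][j]."""
    e = column_degrees(G)
    def coeff(poly, d):
        return poly[d] % 2 if 0 <= d < len(poly) else 0
    return [[coeff(row[j], e[j]) for j in range(len(e))] for row in G]

def rank_gf2(M):
    """Rank over GF(2) by Gaussian elimination on bit-packed rows."""
    rows = [sum(bit << j for j, bit in enumerate(r)) for r in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in rows if (r >> col) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [r ^ pivot if (r >> col) & 1 else r for r in rows]
        rank += 1
    return rank

def is_column_reduced(G):
    return rank_gf2(leading_column_matrix(G)) == len(G[0])

# Hypothetical example: columns (1, 1) and (z, 1 + z); not column reduced.
G_bad = [[[1], [0, 1]],
         [[1], [1, 1]]]
# Subtracting z times the first column from the second gives a reduced matrix.
G_good = [[[1], [0]],
          [[1], [1]]]
```

The second matrix is obtained from the first by a unimodular column operation, so both generate the same module; only the second exhibits the invariant column degrees.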

So far we have used encoding matrices to describe a convolutional code. As is customary
in linear algebra, one often describes a linear subspace as the kernel of a matrix. This leads
to the notion of a parity-check matrix. The following theorem is well known (see e.g. [38]).

Theorem 2.3 Let $\mathcal{C} \subset \mathbb{F}^n((z))$ be a rank-$k$ convolutional code in the sense of Definition A'.
Then there exists an $r \times n$ matrix $H(z)$ such that the code is equivalently described as the
kernel of $H(z)$:

$$\mathcal{C} = \{\, c(z) \in \mathbb{F}^n((z)) \mid H(z)c(z) = 0 \,\}.$$

Moreover, it is possible to choose $H(z)$ in such a way that:

(i) $H(z)$ is a polynomial matrix.

(ii) $H(z)$ is left prime.

(iii) $H(z)$ is row-reduced having row degrees $\{f_1, \ldots, f_r\}$.

Furthermore, every polynomial parity check matrix of $\mathcal{C}$ which is left prime and row reduced
will have (unordered) row degrees $\{f_1, \ldots, f_r\}$. Thus these indices are invariants of the
convolutional code.

Properties (i)-(iii) essentially follow from the fact that the transpose $H^t(z)$ is a generator
matrix for the dual (orthogonal) code $\mathcal{C}^{\perp}$.

The sets of indices $\{e_1, \ldots, e_k\}$ and $\{f_1, \ldots, f_r\}$ differ in general; their sum is however
always the same, and is called the degree of the convolutional code. One says that a rank-$k$
code $\mathcal{C} \subset \mathbb{F}^n((z))$ has transmission rate $k/n$, controller memory $m := \max\{e_1, \ldots, e_k\}$ and
observer memory n := $\max\{f_1, \ldots, f_r\}$.

Another important code parameter is the free distance. The free distance of a code
measures the smallest distance between any two different code words, and is formally defined
as:

$$d_{\mathrm{free}}(\mathcal{C}) := \min_{u, v \in \mathcal{C},\ u \neq v} \ \sum_{t \in \mathbb{Z}} d_H(u_t, v_t), \tag{2.3}$$

where $d_H(\ ,\ )$ denotes the usual Hamming distance on $\mathbb{F}^n$.



3 The symbolic dynamics point of view 



In this section we present a definition of convolutional codes as it can be found in the
symbolic dynamics literature [0, |29|, . Convolutional codes in this framework are exactly
the linear, compact, irreducible and shift-invariant subsets of $\mathbb{F}^n[[z, z^{-1}]]$. In order to make
this precise, we will have to develop some basic notions from symbolic dynamics.

In the sequel we will work with the finite alphabet $\mathcal{A} := \mathbb{F}^n$. A block over the alphabet $\mathcal{A}$
is a finite sequence $\beta = x_1 x_2 \ldots x_k$ consisting of $k$ elements $x_i \in \mathcal{A}$. If $w = w(z) = \sum_i w_i z^i \in
\mathbb{F}^n[[z, z^{-1}]]$ is a sequence, one says that the block $\beta$ occurs in $w$ if there is some integer $j$
such that $\beta = w_j w_{j+1} \ldots w_{k+j-1}$. If $X \subset \mathbb{F}^n[[z, z^{-1}]]$ is any subset, we denote by $\mathcal{B}(X)$ the
set of blocks which occur in some element of $X$.

The fundamental objects in symbolic dynamics are the shift spaces. For this let $\mathcal{F}$ be a
set of blocks, possibly infinite.

Definition 3.1 The subset $X \subset \mathbb{F}^n[[z, z^{-1}]]$ consisting of all sequences $w(z)$ which do not
contain any of the (forbidden) blocks of $\mathcal{F}$ is called a shift space.

The left-shift operator is the $\mathbb{F}$-linear map

$$\sigma : \mathbb{F}[[z, z^{-1}]] \to \mathbb{F}[[z, z^{-1}]], \quad w(z) \mapsto z^{-1}w(z). \tag{3.1}$$

Let $I_n$ be the $n \times n$ identity matrix. The shift map $\sigma$ extends to the shift map

$$\sigma I_n : \mathbb{F}^n[[z, z^{-1}]] \to \mathbb{F}^n[[z, z^{-1}]].$$

One says that $X \subset \mathbb{F}^n[[z, z^{-1}]]$ is a shift-invariant set if $(\sigma I_n)(X) \subset X$. Clearly shift spaces
are shift-invariant subsets of $\mathbb{F}^n[[z, z^{-1}]]$.

It is possible to characterize shift spaces in a topological manner. For this we will
introduce a metric on $\mathbb{F}^n[[z, z^{-1}]]$:

Definition 3.2 If $v(z) = \sum_i v_i z^i$ and $w(z) = \sum_i w_i z^i$ are both elements of $\mathbb{F}^n[[z, z^{-1}]]$ we
define their distance through:

$$d(v(z), w(z)) := \sum_{i \in \mathbb{Z}} 2^{-|i|} d_H(v_i, w_i). \tag{3.2}$$

In this metric two elements $v(z), w(z)$ are 'close' if they coincide over a 'large block around
zero'. One readily verifies that $d(\ ,\ )$ indeed satisfies all the properties of a metric and
therefore induces a topology on $\mathbb{F}^n[[z, z^{-1}]]$. Using this topology we can characterize shift
spaces:

Theorem 3.3 A subset of $\mathbb{F}^n[[z, z^{-1}]]$ is a shift space if and only if it is shift-invariant and
compact.

Proof: The metric introduced in Definition 3.2 is equivalent to the metric described in
[29, Example 6.1.10]. The induced topologies are therefore the same. The result follows therefore
from [29, Theorem 6.1.21]. □
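
The metric of Definition 3.2 is easy to evaluate on finitely supported sequences. The sketch below is an illustration only: it represents a sequence as a dictionary from time index to vector (our convention, not the article's) and treats missing entries as the zero vector.

```python
# Sketch of d(v, w) = sum_i 2^{-|i|} d_H(v_i, w_i) for finitely supported sequences.

def hamming(a, b):
    """Hamming distance between two vectors of equal length."""
    return sum(x != y for x, y in zip(a, b))

def distance(v, w, n):
    """v, w: dicts mapping an integer time i to a length-n tuple; missing = zero."""
    zero = (0,) * n
    return sum(2.0 ** -abs(i) * hamming(v.get(i, zero), w.get(i, zero))
               for i in set(v) | set(w))

near = distance({0: (1, 0)}, {}, n=2)   # difference at time 0
far = distance({5: (1, 1)}, {}, n=2)    # difference only at time 5
```

A disagreement at time 0 costs 1 per differing coordinate, while the same disagreement at time 5 costs only $2^{-5}$ per coordinate, which is the sense in which sequences agreeing on a large block around zero are close.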



The topological space $\mathbb{F}^n[[z, z^{-1}]]$ is a typical example of a linearly compact vector space,
a notion introduced by S. Lefschetz. There is a large theory on linearly compact vector 
spaces, and several of the results which we are going to derive are valid in this broader 
context. We refer the interested reader to [^5|, §10] for more details. 

A further important concept is irreducibility which will turn out to be equivalent to the 
concept of controllability in our concrete setting. 
Definition 3.4 A shift space $X \subset \mathbb{F}^n[[z, z^{-1}]]$ is called irreducible if for every ordered pair
of blocks $\beta, \gamma$ of $\mathcal{B}(X)$ there is a block $\mu$ such that the concatenated block $\beta\mu\gamma$ is in $\mathcal{B}(X)$.

We are now prepared to give the symbolic dynamics definition for a convolutional code 
and to work out the basic properties for these codes. 

Definition B A linear, compact, irreducible and shift-invariant subset of $\mathbb{F}^n[[z, z^{-1}]]$ is
called a convolutional code.

This is an abstract definition and it is not immediately clear how one should encode 
messages with such convolutional codes. The following will make this clear. 

Let $G(z)$ be an $n \times k$ matrix with entries in the ring of Laurent polynomials $\mathbb{F}[z, z^{-1}]$.
Consider the encoding map:

$$\varphi : \mathbb{F}^k[[z, z^{-1}]] \to \mathbb{F}^n[[z, z^{-1}]], \quad m(z) \mapsto c(z) = G(\sigma)m(z). \tag{3.3}$$

In terms of polynomials the map $\varphi$ is simply described through $m(z) \mapsto c(z) = G(z^{-1})m(z)$.

Recall that a continuous map is called closed if the image of a closed set is closed. Using
the fact that $\mathbb{F}^n[[z, z^{-1}]]$ is compact, one (easily) proves the following result:

Lemma 3.5 The encoding map (3.3) is $\mathbb{F}$-linear, continuous and closed.



Clearly $\operatorname{im}(\varphi)$ is also shift-invariant, and one shows that the image of an irreducible set
under $\varphi$ is irreducible again.

In summary we have shown that $\operatorname{im}(\varphi)$ describes a convolutional code in the sense of
Definition B. Actually the converse is true as well:

Theorem 3.6 $\mathcal{C} \subset \mathbb{F}^n[[z, z^{-1}]]$ is a convolutional code in the sense of Definition B if and
only if there exists a Laurent polynomial matrix $G(z)$ such that $\mathcal{C} = \operatorname{im}(\varphi)$, where $\varphi$ is the
map in (3.3).



A proof of this theorem will be given in the next section after Theorem 4.8.
The question now arises how Definition B relates to Definition A and Definition A'. The
following theorem will provide a partial answer to this question.

Theorem 3.7 Assume that $\mathcal{C} \subset \mathbb{F}^n[[z, z^{-1}]]$ is a nonzero convolutional code in the sense
of Definition A or Definition A'. Then $\mathcal{C}$ is not closed, but the closure $\overline{\mathcal{C}}$ of $\mathcal{C}$ is a
convolutional code in the sense of Definition B.

Proof: Let $G(z)$ be a minimal basic encoder of $\mathcal{C}$ and let $w(z) \in \mathbb{F}^n[z]$ be the first column
of $G(z)$. Note that $w(z) \in \mathcal{C}$ and that there is at least one entry of $w(z)$ which does not
contain the factor $(z - 1)$. Let $\phi_N(z) := \sum_{i=-N}^{N} z^i \in \mathbb{F}[z, z^{-1}]$ and consider the sequence
of code words $w^N(z) := \phi_N(z)w(z)$. For each $N > 0$ one has that $w^N(z) \in \mathcal{C}$. However
$\lim_{N \to \infty} w^N(z)$ is in $\mathbb{F}^n[[z, z^{-1}]] \setminus \mathbb{F}^n((z)) \subset \mathbb{F}^n[[z, z^{-1}]] \setminus \mathcal{C}$. This shows that $\mathcal{C}$ is not a closed
set inside $\mathbb{F}^n[[z, z^{-1}]]$. The closure $\overline{\mathcal{C}}$ is obtained by extending the input space $\mathbb{F}^k((z))$ to all of
$\mathbb{F}^k[[z, z^{-1}]]$. The image of $\mathbb{F}^k[[z, z^{-1}]]$ under the encoding map (3.3) is closed by Lemma 3.5,
hence the closure is a code in the sense of Definition B. □
Actually one can show that there is a bijective correspondence between the convolutional
codes in the sense of Definition A (respectively Definition A') and the convolutional codes
in the sense of Definition B, as we will show in Theorem 7.1 and Theorem 7.2. It is also
worthwhile to remark that already in 1983 Staiger published a paper [47] where he studied
the closure of convolutional codes generated by a polynomial generator matrix.
In analogy to Lemma 2.1, one has:

Lemma 3.8 Two $n \times k$ encoding matrices $G(z)$ and $\tilde{G}(z)$ defined over the Laurent
polynomial ring $\mathbb{F}[z, z^{-1}]$ are equivalent with respect to Definition B if and only if there is a $k \times k$
invertible rational matrix $R(z)$ such that $\tilde{G}(z) = G(z)R(z)$.

We leave the proof again as an exercise for the reader. We remark that rational
transformations of the form $R(z)$ are needed to describe the equivalence, even though it is in general
not possible to use a rational encoder $G(z)$ in the encoding procedure (3.3). This is simply
due to the fact that in general the multiplication of an element of $\mathbb{F}(z)$ with an element of
$\mathbb{F}[[z, z^{-1}]]$ is not defined. The following example should make this clear. (Compare also with
Remark 4.4.)



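Example 3.9 Consider $f(z) = \frac{1}{1 - z} = \sum_{i=0}^{\infty} z^i \in \mathbb{F}(z)$ and $g(z) = \sum_{i=-\infty}^{\infty} z^i \in \mathbb{F}[[z, z^{-1}]]$.
Trying to multiply the two power series $f(z), g(z)$ would result in a power series in which
each coefficient would be an infinite sum and hence not defined.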
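
The failure in Example 3.9 can be observed on truncations. In the sketch below (an illustration assuming $\mathbb{F} = GF(2)$; the function name is ours) both series are cut off at $|i| \le N$ and the coefficient of $z^k$ in the product is computed. The number of contributing terms grows with $N$, so the value modulo 2 keeps alternating and has no limit.

```python
# Sketch: coefficient of z^k in (sum_{i=0}^{N} z^i) * (sum_{j=-N}^{N} z^j) modulo p.

def truncated_coeff(k, N, p=2):
    """Count the index pairs (i, j) with i + j = k inside the truncation, mod p."""
    return sum(1 for i in range(N + 1) if -N <= k - i <= N) % p

# Coefficient of z^0 for growing truncation length N = 1, ..., 6.
partial = [truncated_coeff(0, N) for N in range(1, 7)]
```

For $k = 0$ exactly $N + 1$ pairs contribute, so the truncated coefficient is $N + 1$ modulo 2 and never stabilizes as $N$ grows.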

In the same way as at the end of Section 2 we define the transmission rate, the degree,
the memory and the free distance of a convolutional code $\mathcal{C}$ in the sense of Definition B.



4 Linear time-invariant behaviors

In this section we will take the point of view that a convolutional code is a linear
time-invariant behavior in the sense of Willems [50], [51], [52]. Of course behavioral system theory is
quite general, allowing all kinds of time axes and signal spaces. In order to relate the
behavioral concepts to the previous points of view, we will restrict our study to linear behaviors
in $(\mathbb{F}^n)^{\mathbb{Z}} = \mathbb{F}^n[[z, z^{-1}]]$ and $(\mathbb{F}^n)^{\mathbb{Z}_+} = \mathbb{F}^n[[z]]$.

Let $\sigma$ be the shift operator defined in (3.1). One says that a subset $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ is
time-invariant if $(\sigma I_n)(\mathcal{B}) \subset \mathcal{B}$. The concept therefore coincides with the symbolic dynamics
concept of shift-invariance.

In addition to linearity and time-invariance, there is a third important concept usually 
required of a time-invariant behavior: 

Definition 4.1 A behavior $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ is said to be complete if $w \in \mathbb{F}^n[[z, z^{-1}]]$ belongs
to $\mathcal{B}$ whenever $w|_J$ belongs to $\mathcal{B}|_J$ for every finite subinterval $J \subset \mathbb{Z}$.

The definition simply says that $\mathcal{B}$ is complete if membership can be decided on the
basis of finite windows. Completeness is an important well behavedness property for linear
time-invariant behaviors, as Willems |5^, p. 567] emphasized with the remark:
As such, it can be said that the study of non-complete systems does not fall
within the competence of system theorists and could be left to cosmologists or 
theologians. 



In Definition 3.2 we introduced a metric on the vector space $\mathbb{F}^n[[z, z^{-1}]]$. We remark that
with respect to this metric a subset $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ is complete if and only if every Cauchy
sequence of elements of $\mathcal{B}$ converges inside $\mathcal{B}$. In other words, the completeness notion of
Definition 4.1 coincides with the usual topological notion of completeness.

The following result is known for linearly compact vector spaces, a proof can be found 

in 

Lemma 4.2 A linear subset $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ is complete if and only if it is closed and hence
compact.

With these preliminaries we can define a convolutional code as follows: 

Definition C A linear, time-invariant and complete subset $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ is called a
convolutional code.

It is immediate from Lemma 4.2 that the convolutional codes defined in Definition B are
complete and that Definition C is more general than Definition B, since no irreducibility
is required. It also follows from Theorem 3.7 and Lemma 4.2 that the convolutional codes
defined in Definition A and Definition A' are in general not complete.

Before we elaborate on these differences we would like also to treat the situation when the 
time axis is Z + since traditionally a large part of linear systems theory has been concerned 
with systems defined on the positive time axis. 

We first define the left-shift operator acting on $(\mathbb{F}^n)^{\mathbb{Z}_+} = \mathbb{F}^n[[z]]$ through:

$$\sigma : \mathbb{F}[[z]] \to \mathbb{F}[[z]], \quad w(z) \mapsto z^{-1}(w(z) - w(0)). \tag{4.1}$$

We have used the same symbol as in (3.1) since the context will always make it clear if we
work over $\mathbb{Z}$ or $\mathbb{Z}_+$. In analogy to (3.1) $\sigma$ extends to the shift map $\sigma I_n : \mathbb{F}^n[[z]] \to \mathbb{F}^n[[z]]$,
and one says a subset $X \subset \mathbb{F}^n[[z]]$ is time-invariant if $(\sigma I_n)(X) \subset X$. Notice however that
the map of (4.1), unlike that of (3.1), is not invertible.
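
The difference between the two shift maps can be made concrete on finitely supported sequences. In the sketch below, which uses our own data conventions, a two-sided sequence is a dictionary from time index to symbol and a one-sided sequence is a list of coefficients starting at time 0.

```python
# Sketch: the shift (3.1) on two-sided sequences versus the shift (4.1) on one-sided ones.

def shift_two_sided(w):
    """w(z) -> z^{-1} w(z): the symbol at time i moves to time i - 1."""
    return {i - 1: x for i, x in w.items()}

def unshift_two_sided(w):
    """Inverse map w(z) -> z w(z)."""
    return {i + 1: x for i, x in w.items()}

def shift_one_sided(w):
    """w(z) -> z^{-1}(w(z) - w(0)): the symbol at time 0 is discarded."""
    return w[1:]

w = {-1: 1, 0: 0, 2: 1}
a = [1, 0, 1]   # 1 + z^2
b = [0, 0, 1]   # z^2
```

On the two-sided axis the shift has an inverse, whereas the one-sided shift maps the two different sequences `a` and `b` to the same result, so it is not injective.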



With this we have: 

Definition C' A linear, time-invariant and complete subset $\mathcal{B} \subset \mathbb{F}^n[[z]]$ is called a
convolutional code.

The following fundamental theorem was proved by Willems [50, Theorem 5].

Theorem 4.3 A subset $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ (respectively a subset $\mathcal{B} \subset \mathbb{F}^n[[z]]$) is linear,
time-invariant and complete if and only if there is an $r \times n$ matrix $P(z)$ having entries in $\mathbb{F}[z]$ such
that

$$\mathcal{B} = \{\, w(z) \mid P(\sigma)w(z) = 0 \,\}. \tag{4.2}$$
By Lemma 3.5 the linear map $\varphi : \mathbb{F}^n[[z, z^{-1}]] \to \mathbb{F}^r[[z, z^{-1}]]$, $w(z) \mapsto P(\sigma)w(z)$ is
continuous and its kernel is therefore a complete set. It is therefore immediate that the behavior
defined in (4.2) is linear, time-invariant and complete. The harder part of Theorem 4.3 is
the converse statement.

Equation (4.2) is often referred to as a kernel (or AR) representation of a behavioral
system. We will denote a behavior having the form (4.2) by $\ker P(\sigma)$. By contrast, the
encoding map $\varphi$ defined in (3.3) describes an image (or MA) representation of the behavior
$\operatorname{im}(\varphi) = \operatorname{im} G(\sigma)$.
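
The relation between an image and a kernel representation can be checked on a small example. The sketch below assumes $\mathbb{F} = GF(2)$, the encoder $G(z) = (1 + z + z^2,\ 1 + z^2)^t$ and the row $P(z) = (1 + z^2,\ 1 + z + z^2)$; these matrices are our own illustration and satisfy $P(z)G(z) = 0$. Because this is a polynomial identity, it also holds after substituting the shift for $z$.

```python
# Sketch: every word c = G m of the image satisfies the kernel equation P c = 0 (over GF(2)).
# Polynomials are coefficient lists, lowest degree first.

def poly_mul(a, b, p=2):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_add(a, b, p=2):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x + y) % p for x, y in zip(a, b)]

def apply_row(P, c, p=2):
    """Compute sum_j P_j(z) c_j(z) for a 1 x n row P and an n-vector c."""
    total = [0]
    for Pj, cj in zip(P, c):
        total = poly_add(total, poly_mul(Pj, cj, p), p)
    return total

G = [[1, 1, 1], [1, 0, 1]]   # assumed encoder (1 + z + z^2, 1 + z^2)^t
P = [[1, 0, 1], [1, 1, 1]]   # assumed parity check row (1 + z^2, 1 + z + z^2)
m = [1, 0, 1, 1]             # message 1 + z^2 + z^3

c = [poly_mul(g, m) for g in G]
syndrome = apply_row(P, c)   # all zeros for a code word

bad = [c[0][:], c[1][:]]
bad[0][0] ^= 1               # corrupt one symbol
bad_syndrome = apply_row(P, bad)
```

The check only demonstrates the inclusion of the image in the kernel for this example; the statement that every complete behavior admits a kernel representation is the content of Theorem 4.3.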

The most general representation is an ARMA representation. For this let $P(z)$ and $G(z)$
be matrices of size $r \times n$ and $r \times k$ respectively, having entries in the Laurent polynomial
ring $\mathbb{F}[z, z^{-1}]$. Then

$$\mathcal{B} = \{\, w(z) \in \mathbb{F}^n[[z, z^{-1}]] \mid \exists\, m(z) \in \mathbb{F}^k[[z, z^{-1}]] : P(\sigma)w(z) = G(\sigma)m(z) \,\} \tag{4.3}$$

is called an ARMA model. One immediately verifies that the set $\mathcal{B}$ is linear and
time-invariant. It is a direct consequence of Lemma 3.5 that $\mathcal{B}$ is also closed and hence complete.

Theorem 4.3 therefore states that it is possible to eliminate the so called 'latent variable'
$m(z)$ and describe the behavior $\mathcal{B}$ by a simpler kernel representation of the form (4.2). It
follows in particular that the code $\operatorname{im}(\varphi) = \operatorname{im} G(\sigma)$ defined in (3.3) has an equivalent kernel
representation of the form (4.2) but that in general the converse is not true.



Remark 4.4 As we explained in Section 2 it is quite common to use rational encoders for
convolutional codes. In the ARMA model (4.3) we required that the entries of $P(z)$ and
$G(z)$ be from the Laurent polynomial ring. If $P(z)$ and $G(z)$ were rational matrices, then
the behavior $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$ appearing in (4.3) might not be well defined, as we showed in
Example 3.9. On the other hand if one restricts the behavior to the positive time axis $\mathbb{Z}_+$,
i.e. if one assumes that $\mathcal{B} \subset \mathbb{F}^n[[z]]$, then the set (4.3) is defined even if $P(z)$ and $G(z)$ are
rational encoders. This is certainly one reason why much classical system theory focused on
shift spaces $\mathcal{B} \subset \mathbb{F}^n[[z]]$ or $\mathcal{B} \subset \mathbb{F}^n((z))$.



In the sequel we will concentrate on representations of the form (4.2). Again the question
arises, when are two kernel representations equivalent?

Lemma 4.5 Two $r \times n$ matrices $P(z)$ and $\tilde{P}(z)$ defined over the Laurent polynomial ring
$\mathbb{F}[z, z^{-1}]$ describe the same behavior $\ker P(\sigma) = \ker \tilde{P}(\sigma) \subset \mathbb{F}^n[[z, z^{-1}]]$ if and only if there is
an $r \times r$ matrix $U(z)$, unimodular over $\mathbb{F}[z, z^{-1}]$, such that $\tilde{P}(z) = U(z)P(z)$.

Proof: [52, Proposition III.3]. □

Similarly, if $P(z)$ and $\tilde{P}(z)$ are defined over $\mathbb{F}[z]$, then these matrices define the same behavior
$\ker P(\sigma) = \ker \tilde{P}(\sigma) \subset \mathbb{F}^n[[z]]$ if and only if there is a matrix $U(z)$, unimodular over $\mathbb{F}[z]$,
such that $\tilde{P}(z) = U(z)P(z)$.

The major difference between Definition B and Definition C seems to be that Definition C
does not require irreducibility. This last concept corresponds to the term controllability
(see [10]) in systems theory. We first start with some notation taken from [42]:

For a sequence $w = \sum_{i=-\infty}^{\infty} w_i z^i \in \mathbb{F}^n[[z, z^{-1}]]$, we use the symbol $w^+$ to denote the 'right
half' $\sum_{i=0}^{\infty} w_i z^i$ and the symbol $w^-$ to denote the 'left half' $\sum_{i=-\infty}^{-1} w_i z^i$.

Definition 4.6 A behavior $\mathcal{B}$ defined on $\mathbb{Z}$ is said to be controllable if there is some integer
$\ell$ such that for every $w$ and $w'$ in $\mathcal{B}$ and every integer $j$ there exists a $w'' \in \mathcal{B}$ such that
$(z^j w'')^- = (z^j w)^-$ and $(z^{j+\ell} w'')^+ = (z^{j+\ell} w')^+$.



Remark 4.7 Loeliger and Mittelholzer [30] speak of strongly controllable if a behavior
satisfies the conditions of Definition 4.6. 'Weakly controllable' in contrast requires an integer $\ell$
which may depend on the trajectories $w$ and $w'$. The notions are equivalent in our concrete
setting.

We leave it as an exercise for the reader to show that irreducibility as introduced in
Definition 3.4 is equivalent to controllability for linear, time-invariant and complete
behaviors $\mathcal{B} \subset \mathbb{F}^n[[z, z^{-1}]]$. The next theorem gives equivalent conditions for a behavior to be
controllable.

Theorem 4.8 ( cf. Prop. 4.3]) Let $P(z)$ be an $r \times n$ matrix of rank $r$ defined over $\mathbb{F}[z, z^{-1}]$.
The following conditions are equivalent:

(i) The behavior $B = \ker P(\sigma) = \{ w(z) \in \mathbb{F}^n[[z, z^{-1}]] \mid P(\sigma)w(z) = 0 \}$ is controllable.

(ii) $P(z)$ is left prime over $\mathbb{F}[z, z^{-1}]$.

(iii) The behavior $B$ has an image representation. This means there exists an $n \times k$ matrix
$G(z)$ defined over $\mathbb{F}[z, z^{-1}]$ such that

$$B = \{ w(z) \in \mathbb{F}^n[[z, z^{-1}]] \mid \exists\, m(z) \in \mathbb{F}^k[[z, z^{-1}]] : w(z) = G(\sigma)m(z) \}.$$
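For a matrix with a single row ($r = 1$), condition (ii) reduces to a gcd test: $P(z) = (p_1(z), \ldots, p_n(z))$ is left prime over $\mathbb{F}[z, z^{-1}]$ exactly when the gcd of its entries is a unit of the Laurent polynomial ring, that is, a monomial $z^j$. The following is a minimal sketch for $\mathbb{F} = \mathbb{F}_2$. The integer encoding (bit $i$ holds the coefficient of $z^i$) and the helper names `gf2_mod`, `gf2_gcd` and `is_left_prime_row` are assumptions made for illustration and do not come from the paper.

```python
from functools import reduce


def gf2_mod(a, b):
    """Remainder of a divided by b in F_2[z]; bit i of an int is the coefficient of z^i."""
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        # Cancel the leading term of a with a shifted copy of b.
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def gf2_gcd(a, b):
    """Euclidean algorithm in F_2[z]."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def is_left_prime_row(entries):
    """True if the 1 x n matrix with these entries is left prime over F_2[z, z^-1].

    The gcd of the entries must be a monomial z^j, i.e. an integer with one set bit.
    """
    g = reduce(gf2_gcd, entries, 0)
    return g != 0 and g & (g - 1) == 0


# (1 + z, 1 + z + z^2) is left prime; (1 + z^2, 1 + z) shares the factor 1 + z.
print(is_left_prime_row([0b11, 0b111]), is_left_prime_row([0b101, 0b11]))
```

By Theorem 4.8 the first row describes a controllable behavior and the second does not. The row $(z, z^2)$ has gcd $z$, which is a unit over $\mathbb{F}[z, z^{-1}]$ but not over $\mathbb{F}[z]$, so the test above is specific to the time axis $\mathbb{Z}$.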

Combining the theorem with the facts that completeness corresponds to compactness
and irreducibility corresponds to controllability gives a proof of Theorem 3.6.

We conclude the section by defining some parameters of a linear, time-invariant and
complete behavior. For simplicity we will do this in an algebraic manner. We will first treat
behaviors $B \subset \mathbb{F}^n[[z]]$, i.e. behaviors in the sense of Definition C'. In Remark 4.10 we will
explain how the definitions have to be adjusted for behaviors defined on the time axis $\mathbb{Z}$.

Assume that $P(z)$ is an $r \times n$ polynomial matrix of rank $r$ defining the behavior $B = \ker P(\sigma)$.
There exists a matrix $U(z)$, unimodular over $\mathbb{F}[z]$, such that $\tilde{P}(z) = U(z)P(z)$
is row-reduced with ordered row degrees $\nu_1 \geq \ldots \geq \nu_r$. The indices $\nu = (\nu_1, \ldots, \nu_r)$ are
invariants of the row module of $P(z)$ (and hence also invariants of the behavior $B$), and are
sometimes referred to as the Kronecker indices or observability indices of $B$. The invariant
$\delta := \sum_{i=1}^{r} \nu_i$ is called the McMillan degree of the behavior $B$. If we think of $B$ as a
convolutional code in the sense of Definition C' then we say that $B$ has transmission rate $\frac{n-r}{n}$.
Finally, the free distance of the code is defined as in (2.3).
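As a small worked illustration of these parameters, consider the following matrix, chosen here only as an example and not taken from the paper. Its leading row coefficient matrix has full row rank, so it is already row-reduced and the parameters can be read off directly.

```latex
% Illustrative example (assumption: any field F, r = 2, n = 3).
\[
P(z) = \begin{pmatrix} 1 & z & 0 \\ 0 & 1 & z \end{pmatrix},
\qquad
[P]_{\mathrm{lead}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad (\text{rank } 2).
\]
% Row degrees, McMillan degree and rate:
\[
\nu = (\nu_1, \nu_2) = (1, 1),
\qquad
\delta = \nu_1 + \nu_2 = 2,
\qquad
\frac{n - r}{n} = \frac{3 - 2}{3} = \frac{1}{3}.
\]
```

Here $[P]_{\mathrm{lead}}$ is notation introduced for this example: its $i$-th row collects the coefficients of $z^{\nu_i}$ in the $i$-th row of $P(z)$.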



Remark 4.9 The Kronecker indices $\nu$ are in general different from the minimal row indices
(in the sense of Forney) of the $\mathbb{F}(z)$-vector space generated by the rows of $P(z)$. They
coincide with the minimal row indices if and only if $P(z)$ is left prime.

Remark 4.10 If $B \subset \mathbb{F}^n[[z, z^{-1}]]$ is a linear, time-invariant and complete behavior, then we
can define parameters like the Kronecker indices and the McMillan degree in the following
way: Assume $P(z)$ has the property that $B = \ker P(\sigma)$. There exists a matrix $U(z)$,
unimodular over $\mathbb{F}[z, z^{-1}]$, such that $\tilde{P}(z) = U(z)P(z)$ is row-reduced and $\tilde{P}(0)$ has full row
rank $r$. One shows again that the row degrees of $\tilde{P}(z)$ are invariants of the behavior. The
McMillan degree, the transmission rate and the free distance are then defined in the same
way as for behaviors $B \subset \mathbb{F}^n[[z]]$.



5 The module point of view 

Fornasini and Valcher |5], [0| and the present author in joint work with Schumacher, Weiner 



and York |42|, |44|, |49| proposed a module-theoretic approach to convolutional codes. The 
module point of view simplifies the algebraic treatment of convolutional codes to a large de- 
gree, and this simplification is probably almost necessary if one wants to study convolutional 
codes in a multidimensional setting || [48], [49 



From a systems-theoretic point of view, the module-theoretic approach studies linear
time-invariant systems whose states start at zero and return to zero in finite time. Such
dynamical systems have been studied by Hinrichsen and Prätzel-Wolters [18], [19], who
recognized these systems as convenient objects for the study of systems equivalence.

In our development we will again deal with the time axes Z and Z + in a parallel manner. 

Definition D A submodule $C$ of $\mathbb{F}^n[z, z^{-1}]$ is called a convolutional code.

We like the module-theoretic language. If one prefers to define everything in terms
of trajectories then one could equivalently define $C$ as an $\mathbb{F}$-linear, time-invariant subset of
$\mathbb{F}^n[[z, z^{-1}]]$ whose elements have finite support.

The analogous definition for codes supported on the positive time axis $\mathbb{Z}_+$ is:

Definition D' A submodule $C$ of $\mathbb{F}^n[z]$ is called a convolutional code.

Since both the rings $\mathbb{F}[z, z^{-1}]$ and $\mathbb{F}[z]$ are principal ideal domains (PID), a convolutional
code $C$ always has a well-defined rank $k$, and there is a full-rank matrix $G(z)$ of rank $k$ such
that $C = \operatorname{colsp}_{\mathbb{F}[z, z^{-1}]} G(z)$ (respectively $C = \operatorname{colsp}_{\mathbb{F}[z]} G(z)$ if $C$ is defined as in Definition D').
We will call $G(z)$ an encoder of $C$, and the map

$$\varphi : \mathbb{F}^k[z, z^{-1}] \longrightarrow \mathbb{F}^n[z, z^{-1}], \qquad m(z) \longmapsto c(z) = G(z)m(z) \tag{5.1}$$

an encoding map.
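The encoding map (5.1) is ordinary matrix-times-vector multiplication over a polynomial ring. As a minimal sketch for $\mathbb{F} = \mathbb{F}_2$ and the time axis $\mathbb{Z}_+$, polynomials can be stored as Python integers (bit $i$ is the coefficient of $z^i$). The encoder used below, $G(z) = (1 + z + z^2,\ 1 + z^2)^t$, and the names `gf2_mul` and `encode` are illustrative assumptions, not objects defined in the paper.

```python
from functools import reduce
from operator import xor


def gf2_mul(a, b):
    """Product of two polynomials over F_2; bit i of an int is the coefficient of z^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def encode(G, m):
    """Apply c(z) = G(z) m(z) for an n x k encoder G (list of rows) and a message m of length k."""
    return [reduce(xor, (gf2_mul(g, mj) for g, mj in zip(row, m)), 0) for row in G]


G = [[0b111], [0b101]]      # n = 2, k = 1: entries 1 + z + z^2 and 1 + z^2
message = [0b1011]          # m(z) = 1 + z + z^3
codeword = encode(G, message)
print([bin(c) for c in codeword])
```

Shifting the message by $z$ shifts every component of the code word by $z$, and the sum of two messages is mapped to the sum of the code words. This is the linearity and time invariance that every code in the sense of Definition D or D' has.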

Remark 5.1 In contrast to the situation of Section 4, it is possible to define a convolutional
code in the sense of Definition D (respectively Definition D') using a rational encoder. For
this, assume that $G(z)$ is an $n \times k$ matrix with entries in $\mathbb{F}(z)$. Then

$$C = \{ c(z) \in \mathbb{F}^n[z, z^{-1}] \mid \exists\, m(z) \in \mathbb{F}^k[z, z^{-1}] : c(z) = G(z)m(z) \}$$

defines a submodule of $\mathbb{F}^n[z, z^{-1}]$. Note that the map (5.1) involving a rational encoding
matrix $G(z)$ has to be 'input-restricted' in this case.

In analogy to Lemma 3.8 we have:



Lemma 5.2 Two $n \times k$ matrices $G(z)$ and $\tilde{G}(z)$ defined over the Laurent polynomial ring
$\mathbb{F}[z, z^{-1}]$ (respectively over the polynomial ring $\mathbb{F}[z]$) generate the same code $C \subset \mathbb{F}^n[z, z^{-1}]$
(respectively $C \subset \mathbb{F}^n[z]$) if and only if there is a $k \times k$ matrix $U(z)$, unimodular over $\mathbb{F}[z, z^{-1}]$
(respectively over $\mathbb{F}[z]$), such that $\tilde{G}(z) = G(z)U(z)$.

As we already mentioned earlier, convolutional codes in the sense of Definitions D and D'
are linear and time-invariant. The following theorem answers any question about
controllability (i.e. irreducibility) and completeness.

Theorem 5.3 A nonzero convolutional code in the sense of either Definition D or Definition D' is controllable
and incomplete.

Sketch of Proof: The proof of the completeness part of the Theorem is analogous to the
proof of Theorem 3.7. In order to show controllability, let $G(z)$ be an encoding matrix
for a code $C \subset \mathbb{F}^n[z]$ and consider two code words $w(z) = G(z)(a_0 + a_1 z + \cdots + a_s z^s)$ and
$w'(z) = G(z)(b_0 + b_1 z + \cdots + b_s z^s)$. The codeword $w''(z)$ required by Definition 4.6 can be
constructed in the form

$$G(z)\left(a_0 + a_1 z + \cdots + a_j z^j + b_{j+\ell} z^{j+\ell} + \cdots + b_s z^s\right).$$

□ 



Submodules of $\mathbb{F}^n[z, z^{-1}]$ (respectively of $\mathbb{F}^n[z]$) form the Pontryagin dual of linear,
time-invariant and complete behaviors in $\mathbb{F}^n[[z, z^{-1}]]$ (respectively $\mathbb{F}^n[[z]]$). In the following we
follow [42] and explain this in a very explicit way when the time axis is $\mathbb{Z}$. Of course
everything can be done mutatis mutandis when the time axis is $\mathbb{Z}_+$.

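Consider the bilinear form:

$$( \, , \, ) : \mathbb{F}^n[[z, z^{-1}]] \times \mathbb{F}^n[z, z^{-1}] \longrightarrow \mathbb{F}, \qquad (w, v) \longmapsto \sum_{i=-\infty}^{\infty} \langle w_i, v_i \rangle, \tag{5.2}$$

where $\langle \, , \, \rangle$ represents the standard dot product on $\mathbb{F}^n$. One shows that $( \, , \, )$ is well defined
and nondegenerate, in particular because there are only finitely many nonzero terms in the
sum. For any subset $C$ of $\mathbb{F}^n[z, z^{-1}]$ one defines the annihilator

$$C^{\perp} = \{ w \in \mathbb{F}^n[[z, z^{-1}]] \mid (w, v) = 0, \ \forall v \in C \} \tag{5.3}$$

and the annihilator of a subset $B$ of $\mathbb{F}^n[[z, z^{-1}]]$ is

$$B^{\perp} = \{ v \in \mathbb{F}^n[z, z^{-1}] \mid (w, v) = 0, \ \forall w \in B \}. \tag{5.4}$$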

The relation between these two annihilator operations is given by: 






Theorem 5.4 If $C \subset \mathbb{F}^n[z, z^{-1}]$ is a convolutional code with generator matrix $G(z)$, then
$C^{\perp}$ is a linear, left-shift-invariant and complete behavior with kernel representation $P(z) = G^t(z)$.
Conversely, if $B \subset \mathbb{F}^n[[z, z^{-1}]]$ is a linear, left-shift-invariant and complete behavior
with kernel representation $P(z)$, then $B^{\perp}$ is a convolutional code with generator matrix
$G(z) = P^t(z)$.



Remark 5.5 An elementary proof of Theorem |5.4| in the case of the positive time axis Z H 
is given in E2 . 



Remark 5.6 Theorem 5.4 is a special instance of a broad duality theory between solution
spaces of difference equations on the one hand and modules on the other, for which probably
the most comprehensive reference is Oberst [37]. In this article Oberst [37, p. 22] works
with a bilinear form which is different from (5.2). This bilinear form induces however the
same duality as shown in [fTB"|. Extensions of duality results to group codes were derived by
Forney and Trott in [12].

For finite support convolutional codes in the sense of Definition D or Definition D' the
crucial issue is observability. In the literature there have been several definitions of observ- 
ability [|, [II], H, RL RD], and it is not entirely clear how these definitions relate to each 
other. 

In the sequel we will follow [|], |4^| . 

Definition 5.7 (cf. [|], Prop. 2.1]) A code C is observable if there exists an integer N such 
that, whenever the supports of $v$ and $v'$ are separated by a distance of at least $N$ and
$v + v' \in C$, then also $v \in C$ and $v' \in C$.



With this we have the 'Pontryagin dual statement' of Theorem 4.8:



Theorem 5.8 (cf. j^, Prop. 2.10]) Let $G(z)$ be an $n \times k$ matrix of rank $k$ defined over
$\mathbb{F}[z, z^{-1}]$. The following conditions are equivalent:

(i) The convolutional code $C = \operatorname{colsp}_{\mathbb{F}[z, z^{-1}]} G(z)$ is observable.

(ii) $G(z)$ is right prime over $\mathbb{F}[z, z^{-1}]$.

(iii) The code $C$ has a kernel representation. This means there exists an $r \times n$ 'parity-check
matrix' $H(z)$ defined over $\mathbb{F}[z, z^{-1}]$ such that

$$C = \{ v(z) \in \mathbb{F}^n[z, z^{-1}] \mid H(z)v(z) = 0 \}.$$






Remark 5.9 The concept of observability is clearly connected to the coding concept of
non-catastrophicity. Indeed an encoder is non-catastrophic if and only if the code generated
by this encoder is observable. In the context of Definition A (respectively Definition A')
every code has a catastrophic as well as a non-catastrophic encoder. In the module setting
of Definition D every encoder of an observable code is non-catastrophic and every encoder of
a non-observable code is catastrophic. If one defines a convolutional code by Definition D
then one could talk of a 'non-catastrophic convolutional code'. The term observable seems
however much more appropriate.

As at the end of Section 4, we now define the code parameters. We do it only for codes
given by Definition D' and leave it to the reader to adapt the definitions to codes given by
Definition D.

Assume that $G(z)$ is an $n \times k$ polynomial matrix of rank $k$ defining the code
$C = \operatorname{colsp}_{\mathbb{F}[z]} G(z)$. There exists a unimodular matrix $U(z)$ such that $\tilde{G}(z) = G(z)U(z)$ is
column-reduced with ordered column degrees $\kappa_1 \geq \ldots \geq \kappa_k$. The indices $\kappa = (\kappa_1, \ldots, \kappa_k)$ are
invariants of the code $C$, which we call the Kronecker indices or controllability indices of $C$.
The invariant $\delta := \sum_{i=1}^{k} \kappa_i$ is called the degree of the code $C$. The free distance of the code
is defined as in (2.3). Finally we say that $C$ has transmission rate $\frac{k}{n}$.



6 First-order representations 



In this section we provide an overview of the different first-order representations (realizations) 
associated with the convolutional codes and encoding maps which we have defined. 

We start with the encoding map (2.2). As is customary in most of the coding literature,
we view the map (2.2) as an input-output operator from the message space to the code space.
The existence of associated state spaces and realizations can be shown on an abstract level.
Kalman [22], [23] first showed how the encoding map (2.2) can be 'factored', resulting in a
realization of the encoding map $\varphi$. Fuhrmann [13] refined the realization procedure in
an elegant way. (Compare also [15], [17].)

In the sequel we will simply assume that a realization algorithm exists. We summarize 
the main results in the following two theorems: 

Theorem 6.1 Let $T(z)$ be a $p \times m$ proper transfer function of McMillan degree $\delta$. Then
there exist matrices $(A, B, C, D)$ of size $\delta \times \delta$, $\delta \times m$, $p \times \delta$ and $p \times m$ respectively such that

$$T(z) = C(zI - A)^{-1}B + D. \tag{6.1}$$

The minimality conditions are that $(A, B)$ forms a controllable pair and $(A, C)$ forms an
observable pair. Finally (6.1) is unique in the sense that if $T(z) = \tilde{C}(zI - \tilde{A})^{-1}\tilde{B} + \tilde{D}$ with
$(\tilde{A}, \tilde{B})$ controllable and $(\tilde{A}, \tilde{C})$ observable, then there is a unique invertible matrix $S$ such
that

$$(\tilde{A}, \tilde{B}, \tilde{C}, \tilde{D}) = (SAS^{-1}, SB, CS^{-1}, D). \tag{6.2}$$






Consider the encoding map (2.2) with generator matrix $G(z)$. Let $m(z) = \sum_{i=s}^{\infty} m_i z^i \in \mathbb{F}^k((z))$
and $c(z) = \sum_{i=s}^{\infty} c_i z^i \in \mathbb{F}^n((z))$ be the sequence of message and code symbols
respectively. Then one has:

Theorem 6.2 Assume that $G(z)$ has the property that $\operatorname{rank} G(0) = k$. Then $G(z^{-1})$ is a
proper transfer function, and by Theorem 6.1 there exist matrices $(A, B, C, D)$ of appropriate
sizes such that $G(z^{-1}) = C(zI - A)^{-1}B + D$. The dynamics of (2.2) are then equivalently
described by:

$$x_{t+1} = Ax_t + Bm_t, \qquad c_t = Cx_t + Dm_t. \tag{6.3}$$

The realization (6.3) is useful if one wants to describe the dynamics of the encoder $G(z)$. It
is however less useful if one is interested in the construction of codes having certain properties. 
The problem is that every code C has many equivalent encoders whose realizations appear 
to be completely different. 
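To make the recursion (6.3) concrete, the sketch below runs it over $\mathbb{F}_2$ for one hand-built realization and compares the output with the polynomial product $c(z) = G(z)m(z)$. The encoder $G(z) = (1 + z + z^2,\ 1 + z^2)^t$ and the shift-register matrices $(A, B, C, D)$ are assumptions chosen for illustration (the state stores the two previous message bits); they are not taken from the paper.

```python
def simulate(A, B, C, D, msg):
    """Run x_{t+1} = A x_t + B m_t, c_t = C x_t + D m_t over F_2 for a scalar input sequence."""
    n_state = len(A)
    x = [0] * n_state
    outputs = []
    for m in msg:
        c = [(sum(C[i][j] * x[j] for j in range(n_state)) + D[i] * m) % 2
             for i in range(len(C))]
        x = [(sum(A[i][j] * x[j] for j in range(n_state)) + B[i] * m) % 2
             for i in range(n_state)]
        outputs.append(c)
    return outputs, x


def convolve_gf2(g, msg, length):
    """Coefficients 0..length-1 of g(z) m(z) over F_2 (coefficient lists, lowest degree first)."""
    return [sum(g[j] * msg[t - j] for j in range(len(g)) if 0 <= t - j < len(msg)) % 2
            for t in range(length)]


# State x_t = (m_{t-1}, m_{t-2}); outputs m_t + m_{t-1} + m_{t-2} and m_t + m_{t-2}.
A = [[0, 0], [1, 0]]
B = [1, 0]
C = [[1, 1], [0, 1]]
D = [1, 1]

msg = [1, 1, 0, 1]                        # m(z) = 1 + z + z^3
outputs, final_state = simulate(A, B, C, D, msg + [0, 0])   # two zeros flush the state
c1 = [c[0] for c in outputs]
c2 = [c[1] for c in outputs]
print(c1, c2, final_state)
```

Replacing $G(z)$ by an equivalent encoder of the same code would in general require a different quadruple $(A, B, C, D)$, which is the drawback of representation (6.3) noted above.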



Example 6.3 The encoders 



i-z \ / 1=* 



G(z) = f-4 and G{z 



z-4 / \ («-2)(z+3) 

are equivalent since they define the same code in the sense of Definition [S]. The transfer 
functions $G(z^{-1})$ and $\tilde{G}(z^{-1})$ are however very different from a systems theory point of view.
Indeed, they have different McMillan degrees, and over the reals the first is stable whereas
the second is not. The state space descriptions are therefore very different for these encoders.

This example should make it clear that for the purpose of constructing good convolutional
codes, representation (6.3) is not very useful.



We are now coming to the realization theory of the behaviors and codes of Sections 4
and 5. We will continue with our algebraic approach. The results are stated for the positive
time axis $\mathbb{Z}_+$, but they hold mutatis mutandis for the time axis $\mathbb{Z}$.

Theorem 6.4 (Existence) Let $P(z)$ be an $r \times n$ matrix of rank $r$ describing a behavior
$B$ of the form (4.2) with McMillan degree $\delta$. Let $k = n - r$. Then there exist (constant)
matrices $G, F$ of size $\delta \times (\delta + k)$ and a matrix $H$ of size $n \times (\delta + k)$ such that $B$ is equivalently
described by:

$$B = \{ w(z) \in \mathbb{F}^n[[z]] \mid \exists\, \zeta(z) \in \mathbb{F}^{\delta+k}[[z]] : (\sigma G - F)\zeta(z) = 0, \ w(z) = H\zeta(z) \}. \tag{6.4}$$

Moreover the following minimality conditions will be satisfied:

(i) $G$ has full row rank;

(ii) $\begin{bmatrix} G \\ H \end{bmatrix}$ has full column rank;

(iii) $\begin{bmatrix} zG - F \\ H \end{bmatrix}$ is right prime.



For a proof, see [26, Thm. 4.3] or [27]. Equation (6.4) describes the behavior locally in
terms of a time window of length 1. The computation of the matrices $G, F, H$ from a kernel
description is not difficult. It can even be done 'by inspection', i.e., just by rearranging the
data [41]. The next result describes the extent to which minimal first-order realizations are
unique. A proof is given in [26, Thm. 4.34].



Theorem 6.5 (Uniqueness) The matrices $(G, F, H)$ are unique in the following way: If
$(\tilde{G}, \tilde{F}, \tilde{H})$ is a second triple of matrices describing the behavior $B$ through (6.4) and if the
minimality conditions (i), (ii) and (iii) are satisfied, then there exist unique invertible
matrices $S$ and $T$ such that

$$(\tilde{G}, \tilde{F}, \tilde{H}) = (SGT^{-1}, SFT^{-1}, HT^{-1}). \tag{6.5}$$



The relation to the traditional state-space theory is as follows: Assume that $P(z)$ can be
partitioned into $P(z) = (Y(z) \ \ U(z))$ with $U(z)$ a square $r \times r$ matrix and $\deg \det U(z) = \delta$,
the McMillan degree of the behavior $B$. Assume that $(G, F, H)$ provides a realization for $B$
through (6.4). Then one shows that the pencil $\begin{bmatrix} zG - F \\ H \end{bmatrix}$ is equivalent to the pencil:

$$\begin{bmatrix} zI_\delta - A & B \\ 0 & I_k \\ C & D \end{bmatrix}. \tag{6.6}$$



The minimality condition (iii) simply translates into the condition that $(A, C)$ forms an
observable pair, showing that the behavior $B$ is observable. One also verifies that the matrices
$(A, B, C, D)$ form a realization of the proper transfer function $U(z)^{-1}Y(z)$ and that this is
a minimal realization if and only if $(A, B)$ forms a controllable pair. Finally $(A, B)$ is
controllable if and only if the behavior $B$ is controllable.

The Pontryagin dual statements of Theorems 6.4 and 6.5 are (see [42]):



Theorem 6.6 (Existence) Let $G(z)$ be an $n \times k$ polynomial matrix generating a rate $\frac{k}{n}$
convolutional code $C \subset \mathbb{F}^n[z]$ of degree $\delta$. Then there exist $(\delta + n - k) \times \delta$ matrices $K, L$ and
a $(\delta + n - k) \times n$ matrix $M$ (all defined over $\mathbb{F}$) such that the code $C$ is described by

$$C = \{ v(z) \in \mathbb{F}^n[z] \mid \exists\, x(z) \in \mathbb{F}^{\delta}[z] : zKx(z) + Lx(z) + Mv(z) = 0 \}. \tag{6.7}$$

Moreover the following minimality conditions will be satisfied:

(i) $K$ has full column rank;

(ii) $[K \ \ M]$ has full row rank;

(iii) $[zK + L \mid M]$ is left prime.

Equation (6.7) describes the behavior again locally in terms of a time window of length 1.

Theorem 6.7 (Uniqueness) The matrices $(K, L, M)$ are unique in the following way: If
$(\tilde{K}, \tilde{L}, \tilde{M})$ is a second triple of matrices describing the code $C$ through (6.7) and if the
minimality conditions (i), (ii) and (iii) are satisfied, then there exist unique invertible matrices
$T$ and $S$ such that

$$(\tilde{K}, \tilde{L}, \tilde{M}) = (TKS^{-1}, TLS^{-1}, TM). \tag{6.8}$$



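If $G(z)$ can be partitioned into $G(z) = \begin{bmatrix} Y(z) \\ U(z) \end{bmatrix}$ with $U(z)$ a square $k \times k$ matrix and
$\deg \det U(z) = \delta$, the degree of the code $C$, then the pencil $[zK + L \mid M]$ is equivalent to
the pencil:

$$\begin{bmatrix} zI_\delta - A & 0_{\delta \times (n-k)} & -B \\ -C & I_{n-k} & -D \end{bmatrix}. \tag{6.9}$$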



The minimality condition (iii) then translates into the condition that (A, B) forms a con- 
trollable pair, showing that the code C is controllable. One also verifies that the matrices 
$(A, B, C, D)$ form a realization of the proper transfer function $Y(z)U(z)^{-1}$, that this is a
minimal realization if and only if (A, C) forms an observable pair, and that this is the case 
if and only if the code C is observable. Finally, the Kronecker indices of C coincide with the 



controllability indices of the pair (A, B) p4 



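The systems-theoretic meaning of the representation (6.9) is as follows (see [44]). Partition
the code vector $v(z)$ into:

$$v(z) = \begin{bmatrix} y(z) \\ u(z) \end{bmatrix} \in \mathbb{F}^n[z]$$

and consider the equation:

$$\begin{bmatrix} zI_\delta - A & 0_{\delta \times (n-k)} & -B \\ -C & I_{n-k} & -D \end{bmatrix} \begin{bmatrix} x(z) \\ y(z) \\ u(z) \end{bmatrix} = 0. \tag{6.10}$$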



Let

$$\begin{aligned}
x(z) &= x_0 z^{\gamma} + x_1 z^{\gamma-1} + \cdots + x_{\gamma}; & x_t &\in \mathbb{F}^{\delta}, & t &= 0, \ldots, \gamma, \\
u(z) &= u_0 z^{\gamma} + u_1 z^{\gamma-1} + \cdots + u_{\gamma}; & u_t &\in \mathbb{F}^{k}, & t &= 0, \ldots, \gamma, \\
y(z) &= y_0 z^{\gamma} + y_1 z^{\gamma-1} + \cdots + y_{\gamma}; & y_t &\in \mathbb{F}^{n-k}, & t &= 0, \ldots, \gamma.
\end{aligned}$$

Then (6.10) is satisfied if and only if

$$x_{t+1} = Ax_t + Bu_t, \qquad y_t = Cx_t + Du_t, \qquad v_t = \begin{bmatrix} y_t \\ u_t \end{bmatrix}, \qquad x_0 = 0, \quad x_{\gamma+1} = 0 \tag{6.11}$$

is satisfied. Note that the state-space representation (6.11) is different from the
representation (6.3). Equation (6.11) describes the dynamics of the systematic and rational encoder

$$G(z)U^{-1}(z) = \begin{bmatrix} Y(z)U(z)^{-1} \\ I_k \end{bmatrix}.$$

The encoding map $u(z) \longmapsto y(z) = G(z)U^{-1}(z)u(z)$ is input-restricted, i.e. $u(z)$ must be in
the column module of $U(z)$ in order to make sure that $y(z)$ and $x(z)$ have finite support. In
terms of systems theory, this simply means that the state should start at zero and return
to zero in finite time. Linear systems satisfying these requirements have been studied by
Hinrichsen and Prätzel-Wolters [18], [19].



7 Differences and similarities among the definitions 

After having reviewed these different definitions for convolutional codes, we would like to
make some comparisons.

The definitions of Section 2 and Section 3 viewed convolutional codes as linear,
time-invariant, controllable and observable behaviors, not necessarily complete. Definition C and
Definition C' were more general in the sense that non-controllable behaviors were accepted
as codes. Definition D and Definition D' were more general in the sense that non-observable
codes were allowed.

In the following subsection we show that all definitions are equivalent for all practical 
purposes if one restricts oneself to controllable and observable codes. 

7.1 Controllable and observable codes 

Consider a linear, time-invariant, complete behavior $B \subset \mathbb{F}^n[[z, z^{-1}]]$, i.e. a convolutional
code in the sense of Definition C. Let

$$C := B \cap \mathbb{F}^n((z)).$$

Then one has

Theorem 7.1 $C$ is a convolutional code in the sense of Definition A and its completion $\bar{C}$
is the largest controllable sub-behavior of $B$. Moreover, one has a bijective correspondence
between controllable behaviors $B \subset \mathbb{F}^n[[z, z^{-1}]]$ and convolutional codes $C \subset \mathbb{F}^n((z))$ in the
sense of Definition A.

Sketch of Proof: Let $B = \ker P(\sigma) = \{ w(z) \in \mathbb{F}^n[[z, z^{-1}]] \mid P(\sigma)w(z) = 0 \}$. If $B$ is not
controllable, then $P(z)$ is not left prime and one has a factorization $P(z) = V(z)\tilde{P}(z)$,
where $\tilde{P}(z)$ is left prime and describes the controllable sub-behavior $\ker \tilde{P}(\sigma) \subset B$. Since
$\ker V(\sigma)$ is an autonomous behavior it follows that

$$C = B \cap \mathbb{F}^n((z)) = \ker P(\sigma) \cap \mathbb{F}^n((z)) = \ker \tilde{P}(\sigma) \cap \mathbb{F}^n((z)).$$

It follows (compare with Theorem 3.7) that the completion $\bar{C} = \ker \tilde{P}(\sigma)$. □

Consider now a convolutional code $C \subset \mathbb{F}^n((z))$ in the sense of Definition A. Define:

$$\check{C} := C \cap \mathbb{F}^n[z, z^{-1}], \qquad \hat{C} := C \cap \mathbb{F}^n[z].$$

Conversely, if $C \subset \mathbb{F}^n[z]$ is a convolutional code in the sense of Definition D', then define:

$$\check{C} := \operatorname{span}_{\mathbb{F}[z, z^{-1}]} \{ v(z) \mid v(z) \in C \}, \qquad \tilde{C} := \operatorname{span}_{\mathbb{F}((z))} \{ v(z) \mid v(z) \in C \}.$$

By definition it is clear that $\check{C} \subset \tilde{C}$ are convolutional codes in the sense of Definition D and
Definition A respectively.

Theorem 7.2 Assume that $C \subset \mathbb{F}^n((z))$ is a convolutional code in the sense of Definition A.
Then $\hat{C} \subset \mathbb{F}^n[z]$ is an observable code in the sense of Definition D'. Moreover the
operations $\hat{\ }$ and $\tilde{\ }$ induce a bijective correspondence between the observable codes $C \subset \mathbb{F}^n[z]$
and convolutional codes $C \subset \mathbb{F}^n((z))$ in the sense of Definition A.



Theorem 7.2 is essentially the Pontryagin dual statement of Theorem 7.1; we leave it
to the reader to work out the details. Theorems 7.1 and 7.2 together show that there is a
bijection between controllable and observable codes in the sense of one definition and another
definition. For controllable and observable codes the code parameters like the rate $k/n$, the
degree $\delta$ and the Forney (Kronecker) indices are all the same. Moreover the free distance
is in every case the same as well. For all practical purposes one can therefore say that the
frameworks are completely equivalent, if one is only interested in controllable and observable
codes.

The advantage of Definition D (respectively Definition D') over the other definitions lies
in the fact that non-observable codes become naturally part of the theory. It also seems 
that for construction purposes the relation between quasi-cyclic codes and convolutional 



codes [^3], [46] is best described in a module-theoretic framework. 

Definition C (respectively Definition C') allows one to introduce non-controllable codes
in a natural way. 

A Laurent series setting as in Definition A seems to be most natural if one is interested
in the description of the encoder and/or syndrome former. Extending the Laurent series
framework to multidimensional convolutional codes is however much less natural than the
polynomial framework, which is why the theory of multidimensional convolutional codes has 



mainly been developed in a module-theoretic framework B [48]. [49 . 



7.2 Duality 



In (5.2) we introduced a bilinear form which induced a bijection between behaviors
$B \subset \mathbb{F}^n[[z, z^{-1}]]$ and modules $C \subset \mathbb{F}^n[z, z^{-1}]$. This duality is a special instance of Pontryagin
duality, and generalizes to group codes [12] and multidimensional systems [37].

In this subsection we show that the bilinear form (5.2) can also be used to obtain a duality
between modules and modules (both in $\mathbb{F}^n[z, z^{-1}]$) or between behaviors and behaviors (both
in $\mathbb{F}^n[[z, z^{-1}]]$).

For this let C C ¥ n [z, z~ x \ be a submodule. Define: 

C 1 ":=C- L nF n [z,z- 1 ]. (7.1) 

One immediately verifies that C h is a submodule of ¥ n [z, z~ l \, which necessarily is observable. 
One always has C C (C h ) h . 

One can do something similar for behaviors. For this let B C F ra [[2:, z -1 ]] be a behavior. 
Define: 

B h := (& n¥ n [z, z' 1 )) 1 - = B I . (7.2) 

Then it is immediate that £> h is a controllable behavior, (B h ) h C B and {B h ) h describes the 
controllable sub-behavior of B. 

It is also possible to adapt fl5.2|) for a duality of subspaces C C ¥ n ((z)). For such a 
subspace we define: 

& := (CnF n [^^- 1 ]) ± nF n ((^)). (7.3) 

The duality (7.1) does not in general correspond to the linear algebra dual of the
$R = \mathbb{F}[z, z^{-1}]$ module $C \subset R^n$ since there is some 'time reversal' involved. The same is true for
the duality (7.3), which does not correspond to the linear algebra dual of the $\mathbb{F}((z))$ vector
space $C$ without time reversal.

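If one works however with the 'time-reversed' bilinear form:

$$[ \, , \, ] : \mathbb{F}^n[[z, z^{-1}]] \times \mathbb{F}^n[z, z^{-1}] \longrightarrow \mathbb{F}, \qquad (w(z), v(z)) \longmapsto \sum_{i=-\infty}^{\infty} \langle w_i, v_{-i} \rangle, \tag{7.4}$$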

then the definitions ( |7. 1| ) and ( |7.3| ) do correspond to the module dual (and the linear algebra 
dual respectively), used widely in the coding literature |38[. In this case one has: If G(z) is 
a generator matrix of C h then H(z) := G t (z) is a parity check matrix of (C h ) h . 

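In the Laurent-series context it is also possible to induce the duality (7.3) directly through
the time-reversed bilinear form defined on $\mathbb{F}^n((z)) \times \mathbb{F}^n((z))$:

$$[ \, , \, ] : \mathbb{F}^n((z)) \times \mathbb{F}^n((z)) \longrightarrow \mathbb{F}, \qquad (w(z), v(z)) \longmapsto \sum_{i=-\infty}^{\infty} \langle w_i, v_{-i} \rangle. \tag{7.5}$$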



Note that the sum appearing in (7.5) is always well defined. This bilinear form has been
widely used in functional analysis and in systems theory [14].

7.3 Convolutional codes as subsets of $\mathbb{F}[[z, z^{-1}]]$, a case study.

In this subsection we illustrate the differences of the definitions in the peculiar case $n = 1$.

If one works with Definition A or Definition B then there exist only the two trivial codes
having the $1 \times 1$ generator matrix $(1)$ and $(0)$ as subsets of $\mathbb{F}[[z, z^{-1}]]$.

The situation of Definition C is already more interesting. For each polynomial $p(z)$ one
has the associated 'autonomous behavior':

$$B = \{ w(z) \mid p(\sigma)w(z) = 0 \}. \tag{7.6}$$

Autonomous behaviors are the extreme case of uncontrollable behaviors. If $\deg p(z) = \delta$,
then $B$ is a finite-dimensional $\mathbb{F}$-vector space of dimension $\delta$. For coding purposes $B$ is not
useful at all. Indeed, the code allows only $\delta$ symbols to be chosen freely, say the symbols
$w_0, w_1, \ldots, w_{\delta-1}$. With this the codeword $w(z) = \sum_{i=-\infty}^{\infty} w_i z^i \in B$ is determined, and the
transmission of $w(z)$ requires infinitely many symbols in the past and infinitely many symbols in the future.
In other words, the code has transmission rate 0. The distance of the code is however very
good, namely $d_{\mathrm{free}}(B) = \infty$. If $B$ is defined on the positive time axis, i.e. $B \subset \mathbb{F}[[z]]$, then the
situation is only slightly better. Indeed in this situation, one sends first $\delta$ message words and
then an infinite set of 'check symbols'. As these remarks make clear, a code of the form (7.6)
is not very useful.

The most interesting situation happens in the setup of Definition D and Definition D'.
In this situation the codes are exactly the ideals $\langle g(z) \rangle \subset \mathbb{F}[z, z^{-1}]$ (respectively
$\langle g(z) \rangle \subset \mathbb{F}[z]$). We now show that ideals of the form $\langle g(z) \rangle$ are of interest in the coding context.

Example 7.3 Let $\mathbb{F} = \mathbb{F}_2 = \{0, 1\}$. Consider the ideal generated by $g(z) = (z + 1)$.
$\langle g(z) \rangle \subset \mathbb{F}[z, z^{-1}]$ consists in this case of the even-weight sequences, namely the set of all
sequences with a finite and even number of ones. This code is controllable but not observable.
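Both claims of Example 7.3 can be checked numerically for polynomial code words. The sketch below is an illustration under stated assumptions: polynomials over $\mathbb{F}_2$ are stored as Python integers (bit $i$ is the coefficient of $z^i$), and the helper names `gf2_mul`, `gf2_mod`, `weight` and `in_code` are hypothetical. It verifies that multiples of $z + 1$ have even weight, and that for any separation $N$ the sum $1 + z^N$ lies in the code although neither summand does, so no integer $N$ can satisfy Definition 5.7.

```python
def gf2_mul(a, b):
    """Product of two polynomials over F_2; bit i of an int is the coefficient of z^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, b):
    """Remainder of a divided by b in F_2[z]."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def weight(c):
    """Number of nonzero coefficients of c(z)."""
    return bin(c).count("1")


def in_code(c, g=0b11):
    """Membership of the polynomial c(z) in the ideal generated by g(z) = z + 1."""
    return gf2_mod(c, g) == 0


# Every multiple of z + 1 has even weight.
all_even = all(weight(gf2_mul(0b11, m)) % 2 == 0 for m in range(1, 1 << 10))

# Non-observability: v = 1 and v' = z^N have supports N apart, v + v' is a code word,
# yet neither v nor v' is one.
N = 1000
v, v_prime = 1, 1 << N
print(all_even, in_code(v ^ v_prime), in_code(v), in_code(v_prime))
```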

Ideals of the form < g(z) > are the extreme case of non-observable behaviors. In principle 
this makes it impossible for the receiver to decode a message. However with some additional 
'side-information' decoding can still be performed, as we now explain. 

One of the most often used codes in practice is probably the cyclic redundancy check
code (CRC code). These codes are the main tool to ensure error-free transmissions over the
Internet. They can be defined in the following way: Let $g(z) \in \mathbb{F}[z]$ be a polynomial. Then
the encoding map is simply defined as:

$$\varphi : \mathbb{F}[z] \longrightarrow \mathbb{F}[z], \qquad m(z) \longmapsto c(z) = g(z)m(z). \tag{7.7}$$

The code is then the ideal $\langle g(z) \rangle = \operatorname{im}(\varphi)$. The distance of this code is 2, since there
exists an integer $N$ such that $(z^N - 1) \in \langle g(z) \rangle$. As we already mentioned the code is not
observable. Assume now that the sender gives some additional side information indicating
the start and the end of a message. This can be either done by saying: "I will send in a
moment 1 Mb", or it can be done by adding some 'stop signal' at the end of the transmission.
Once the receiver knows that the transmission is over, he applies long division to compute

$$c(z) = \hat{m}(z)g(z) + r(z), \qquad \deg r(z) < \delta.$$

If $r(z) = 0$ the receiver accepts the message $\hat{m}(z)$ as the transmitted message $m(z)$.
Otherwise he will ask for retransmission.

The code performs best over a channel (like the Internet) which has the property that
the whole message is transmitted correctly with probability $p$ and with probability $1 - p$
whole blocks of the message are corrupted during transmission. One immediately sees that
the probability that a corrupted message $\hat{m}(z)$ is accepted is $q^{-\delta}$, where $q = |\mathbb{F}|$ is the field
size.

One might argue that the code $\langle g(z) \rangle = \operatorname{im}(\varphi)$ is simply a cyclic block code, but this
is not quite the case. Note that the protocol does not specify any length of the code word
and in each transmission a different message length can be chosen. In particular the code
can even be used if the message length is longer than $N$, where $N$ is the smallest integer
such that $(z^N - 1) \in \langle g(z) \rangle$.

Example 7.4 Let $\mathbb{F} = \mathbb{F}_2 = \{0, 1\}$ and let $g(z) = z^{20} + 1$. Assume transmission is done on
a channel with very low error probability where once in a while a burst error might happen
destroying a whole sequence of bits. Assume that the sender uses a stop signal where he
repeats the 4 bits 0011 for 100 times. Under these assumptions the receiver can be reasonably
sure once a transmission has been completed. The probability of failure to detect a burst error
is in this case $2^{-20}$, which is less than $10^{-6}$. Note that $g(z)$ is a very poor generator for a
cyclic code of any block length.



Remark 7.5 CRC codes are in practice often implemented in a slightly different way than
we described it above (see e.g. [36]). The sender typically performs long division on $z^{\delta} m(z)$
and computes

$$z^{\delta} m(z) = f(z)g(z) + r(z), \qquad \deg r(z) < \delta.$$

He then transmits the code word $c(z) := z^{\delta} m(z) - r(z) \in \langle g(z) \rangle$. Clearly the schemes are
equivalent. The advantage of the latter is that the message sequence $m(z)$ is transmitted in
'plain text', allowing processing of the data immediately.
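The systematic scheme of Remark 7.5 can be sketched in a few lines for $\mathbb{F} = \mathbb{F}_2$, using the generator $g(z) = z^{20} + 1$ of Example 7.4. This is an illustrative sketch, not a description of any deployed CRC standard: polynomials are stored as Python integers (bit $i$ is the coefficient of $z^i$), and the function names `crc_encode`, `crc_accepts` and `crc_message` are assumptions.

```python
def gf2_mod(a, b):
    """Remainder of a divided by b in F_2[z]; bit i of an int is the coefficient of z^i."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def crc_encode(m, g):
    """Code word c(z) = z^delta m(z) - r(z), where r(z) is the remainder of z^delta m(z) by g(z)."""
    delta = g.bit_length() - 1
    shifted = m << delta
    return shifted ^ gf2_mod(shifted, g)   # over F_2, subtraction equals addition (XOR)


def crc_accepts(c, g):
    """The receiver accepts a word exactly when it is a multiple of g(z)."""
    return gf2_mod(c, g) == 0


def crc_message(c, g):
    """Read the message back from the high-order coefficients ('plain text')."""
    return c >> (g.bit_length() - 1)


g = (1 << 20) | 1                  # g(z) = z^20 + 1 as in Example 7.4
m = 0b1011001110001111
c = crc_encode(m, g)

# A burst error touching at most 20 consecutive positions is never a multiple of g(z).
burst = 0b10110111 << 7
print(crc_accepts(c, g), crc_message(c, g) == m, crc_accepts(c ^ burst, g))
```

The message length is not fixed anywhere in this sketch, which mirrors the remark above that the protocol does not specify a block length.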



7.4 Some geometric remarks 

One motivation for the author to take a module-theoretic approach to convolutional coding 
theory has come from algebraic-geometric considerations. As is explained in |31], |39|, fHJ, a 
submodule of rank $k$ and degree $\delta$ in $\mathbb{F}^n[z]$ describes a quotient sheaf of rank $k$ and degree $\delta$
over the projective line $\mathbb{P}^1$. The set of all such quotient sheaves having rank $k$ and degree at
most $\delta$ has the structure of a smooth projective variety denoted by $X^{\delta}_{k,n}$. This variety has
been of central interest in the recent algebraic geometry literature. In the context of coding
theory, it has actually been used to predict the existence of maximum-distance-separable
(MDS) convolutional codes [13].

The set of convolutional codes in the sense of Definition A or A' or B having rate $\frac{k}{n}$ and
degree at most $\delta$ form all proper Zariski open subsets of $X^{\delta}_{k,n}$. The points in the closure of
these Zariski open sets are exactly the non-observable codes if the rate is $\frac{1}{n}$. These geometric
considerations suggest that non-observable convolutional codes should be incorporated into
a complete theory of convolutional codes. The following example will help to clarify these
issues:



Example 7.6 Let $\delta = 2$, $k = 1$ and $n = 2$, i.e., consider $X^2_{1,2}$. Any code of degree at
most 2 then has an encoder of the form:

$$G(z) = \begin{pmatrix} a_0 + a_1 z + a_2 z^2 \\ b_0 + b_1 z + b_2 z^2 \end{pmatrix}.$$



We can identify the encoder through the point $(a_0, a_1, a_2, b_0, b_1, b_2) \in \mathbb{P}^5$. The
variety $X^2_{1,2}$ is in this example exactly the projective space $\mathbb{P}^5$. For codes in the
sense of Definition A or A' or B, $G(z)$ must be taken as a basic minimal encoder in order to
have a unique parameterization. This requires that the two entries $g_1(z)$ and $g_2(z)$ of $G(z)$
are coprime polynomials. The set of coprime polynomials $g_1(z), g_2(z)$ viewed as a subset of
$\mathbb{P}^5$ forms a Zariski open subset $U \subset \mathbb{P}^5$ described by the resultant condition



$$\det \begin{pmatrix}
a_0 & 0 & b_0 & 0 \\
a_1 & a_0 & b_1 & b_0 \\
a_2 & a_1 & b_2 & b_1 \\
0 & a_2 & 0 & b_2
\end{pmatrix} \neq 0.$$



For codes in the sense of Definition C, we require that $a_0$ and $b_0$ are not simultaneously
zero in order to have a unique parameterization. This definition leads to a larger Zariski open
set $V$, i.e. $U \subset V \subset \mathbb{P}^5$. Only with Definition D does one obtain the whole variety
$X^2_{1,2} = \mathbb{P}^5$.
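
The resultant condition of Example 7.6 is easy to test by machine. The sketch below assumes
the base field $\mathbb{F}_2$, so that every point of $\mathbb{P}^5$ has exactly one nonzero coordinate
vector. It arranges the $4 \times 4$ Sylvester matrix with the two shifted coefficient columns of
the $a_i$ next to those of the $b_i$ (another arrangement changes the determinant at most by a
sign). It counts the points of $U$ (nonzero resultant) and of $V$ (the set where $a_0$ and $b_0$ are
not simultaneously zero). All function names are illustrative.

```python
from itertools import product


def det(m):
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def sylvester(a, b):
    """Sylvester matrix of a0 + a1 z + a2 z^2 and b0 + b1 z + b2 z^2."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    return [
        [a0, 0, b0, 0],
        [a1, a0, b1, b0],
        [a2, a1, b2, b1],
        [0, a2, 0, b2],
    ]


def in_U(a, b):
    """Resultant condition over F_2: the determinant is nonzero mod 2."""
    return det(sylvester(a, b)) % 2 != 0


def in_V(a, b):
    """The constant coefficients a0 and b0 are not simultaneously zero."""
    return a[0] != 0 or b[0] != 0


def count_points():
    """Return the number of points of P^5(F_2) in P^5, in V and in U."""
    total = v = u = 0
    for coeffs in product((0, 1), repeat=6):
        if not any(coeffs):
            continue  # the zero vector is not a projective point
        a, b = coeffs[:3], coeffs[3:]
        total += 1
        v += in_V(a, b)
        u += in_U(a, b)
    return total, v, u
```

Over $\mathbb{F}_2$ the call `count_points()` returns `(63, 48, 24)`, in line with the inclusions
$U \subset V \subset \mathbb{P}^5$. For instance the pair $1 + z + z^2$, $1$ lies in $U$, whereas $1 + z$,
$1 + z^2 = (1 + z)^2$ has a common factor and does not.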



In the general situation $X^\delta_{k,n}$ naturally contains the non-observable codes as well. If
$k = 1$, then $X^\delta_{1,n} = \mathbb{P}^{n(\delta+1)-1}$, and the codes in the sense of Definition D
having rate $1/n$ and degree at most $\delta$ are exactly parameterized by $X^\delta_{1,n}$.
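
As a consistency check against Example 7.6, the dimension of the projective space for $k = 1$
can be counted directly. The following sketch only restates the formula above for the
parameters of the example:

```latex
% A rate 1/n encoder of degree at most \delta consists of n polynomials,
% each with \delta + 1 coefficients, taken up to a common nonzero scalar.
\[
  \dim \mathbb{P}^{\,n(\delta+1)-1} = n(\delta+1) - 1,
  \qquad
  n = 2,\ \delta = 2: \quad 2 \cdot 3 - 1 = 5 .
\]
```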



8 Conclusion 

The paper surveys a number of different definitions of convolutional codes. All definitions 
have in common that a convolutional code is a subset $\mathcal{C} \subset \mathbb{F}^n[[z, z^{-1}]]$ which is both linear and
time-invariant. The definitions differ in requirements such as controllability, observability, 
completeness and restriction to finite support. 

If one requires that a code be both controllable and observable, then the restriction to
any finite time window will result in equivalent definitions. Actually Loeliger and
Mittelholzer [30] define a convolutional code locally in terms of one trellis section and they
require in their definition that a code is controllable and observable. Algebraically such a
trellis section is simply described through the generalized first order description (6.4) or (6.7).
If one wants to have a theory which allows one to work with rational encoders, then it will
be necessary that the code has finite support on the negative time axis $\mathbb{Z}_-$ (or alternatively
on the positive time axis $\mathbb{Z}_+$). This is one reason why a large part of the coding literature
works with the field of formal Laurent series.

If one wants in addition to have a theory which can accommodate non-observable codes
(and such a theory seems to have some value), then it is best to work in a module-theoretic
setting.